Abstract

Studies on ensemble methods for classification suffer from the difficulty 
of modeling the complementary strengths of the components. Kleinberg's 
theory of stochastic discrimination (SD) addresses this rigorously via math- 
ematical notions of enrichment, uniformity, and projectability of an ensem- 
ble. We explain these concepts via a very simple numerical example that 
captures the basic principles of the SD theory and method. We focus on 
a fundamental symmetry in point set covering that is the key observation 
leading to the foundation of the theory. We believe a better understanding 
of the SD method will lead to developments of better tools for analyzing 
other ensemble methods. 

1 Introduction 

Methods for classifier combination, or ensemble learning, can be divided into 
two categories: 1) decision optimization methods that try to obtain consensus 
among a given set of classifiers to make the best decision; 2) coverage optimization 
methods that try to create a set of classifiers that work best with a fixed decision 
combination function to cover all possible cases. 

Decision optimization methods rely on the assumption that the given set of
classifiers, typically of a small size, contains sufficient expert knowledge about
the application domain, and each of them excels in a subset of all possible inputs.
A decision combination function is chosen or trained to exploit the individual 
strengths while avoiding their weaknesses. Popular combination functions in- 
clude majority/plurality votes[19], sum/product rules[14], rank/confidence score 
combination [12], and probabilistic methods [13]. While numerous successful ap- 
plications of these methods have been reported, the joint capability of the clas- 
sifiers sets an intrinsic limitation on decision optimization that the combination 
functions cannot overcome. A challenge in this approach is to find out the "blind 
spots" of the ensemble and to obtain a classifier that covers them. 

Coverage optimization methods use an automatic and systematic mechanism 
to generate new classifiers with the hope of covering all possible cases. A fixed, 
typically simple, function is used for decision combination. This can take the form 
of training set subsampling, such as stacking [22], bagging[2], and boosting[5], 
feature subspace projection[10], superclass/subclass decomposition[4], or other 
forms of random perturbation of the classifier training procedures [6]. Open 



questions in these methods are 1) how many classifiers are enough? 2) what
kind of differences among the component classifiers yield the best combined
accuracy? 3) how much limitation is set by the form of the component classifiers?

Apparently both categories of ensemble methods run into some dilemma. 
Should the component classifiers be weakened in order to achieve a stronger 
whole? Should some accuracy be sacrificed for the known samples to obtain 
better generalization for the unseen cases? Do we seek agreement, or differences
among the component classifiers? 

A central difficulty in studying the performance of these ensembles is how
to model the complementary strengths among the classifiers. Many proofs rely 
on an assumption of statistical independence of component classifiers' decisions. 
But rarely is there any attempt to match this assumption with observations of 
the decisions. Often, global estimates of the component classifiers' accuracies 
are used in their selection, while in an ensemble what matter more are the local 
estimates, plus the relationship between the local accuracy estimates on samples 
that are close neighbors in the feature space. ^ 

Deeper investigation of these issues leads back to three major concerns in 
choosing classifiers: discriminative power, use of complementary information, and 
generalization power. A complete theory on ensembles must address these three 
issues simultaneously. Many current theories rely, either explicitly or implicitly, 
on ideal assumptions on one or two of these issues, or have them omitted entirely, 
and are therefore incomplete.

Kleinberg's theory and method of stochastic discrimination (SD) [15][16] is
the first attempt to explicitly address these issues simultaneously from a
mathematical point of view. In this theory, rigorous notions are made for
discriminative power, complementary information, and generalization power of an
ensemble. A fundamental symmetry is observed between the probability of a fixed
model covering a point in a given set and the probability of a fixed point being
covered by a model in a given ensemble. The theory establishes that these three
conditions are sufficient for an ensemble to converge, with increases in its size,
to the most accurate classifier for the application.

Kleinberg's analysis uses a set-theoretic abstraction to remove all the algo- 
rithmic details of classifiers, features, and training procedures. It considers only 
the classifiers' decision regions in the form of point sets, called weak models, in 
the feature space. A collection of classifiers is thus just a sample from the power 
set of the feature space. If the sample satisfies a uniformity condition, i.e., if its 
coverage is unbiased for any local region of the feature space, then a symmetry is 
observed between two probabilities (w.r.t. the feature space and w.r.t. the power 
set, respectively) of the same event that a point of a particular class is covered 
by a component of the sample. Discrimination between classes is achieved by 
requiring some minimum difference in each component's inclusion of points of 
different classes, which is trivial to satisfy. By way of this symmetry, it is shown 
that if the sample of weak models is large, the discriminant function, defined 
on the coverage of the models on a single point and the class-specific differences 



^ There is more discussion of these difficulties in a recent review [8].



within each model, converges to poles distinct by class with diminishing variance. 

We believe that this symmetry is the key to the discussions on classifier com- 
bination. However, since the theory was developed from a fresh, original, and 
independent perspective on the problem of learning, there have not been many 
direct links made to the existing theories. As the concepts are new, the claims 
are high, the published algorithms appear simple, and the details of more sophis- 
ticated implementations are not known, the method has been poorly understood 
and is sometimes referred to as mysterious. 

It is the goal of this lecture to illustrate the basic concepts in this theory and 
remove the apparent mystery. We present the principles of stochastic discrim- 
ination with a very simple numerical example. The example is so chosen that 
all computations can be easily traced step-by-step by hand or with very simple 
programs. We use Kleinberg's notation wherever possible to make it easier for
the interested readers to follow up on the full theory in the original papers. The
emphasis in this note is on explaining the concepts of uniformity and enrichment,
and the behavior of the discriminant when both conditions are fulfilled. For the
details of the mathematical theory and outlines of practical algorithms, please
refer to the original publications [15][16][17][18].

2 Symmetry of Probabilities Induced by Uniform Space 
Covering 

The SD method is based on a fundamental symmetry in point set covering. To 
illustrate this symmetry, we begin with a simple observation. Consider a set 
$S = \{a, b, c\}$ and all the subsets with two elements $s_1 = \{a, b\}$, $s_2 = \{a, c\}$, and
$s_3 = \{b, c\}$. By our choice, each of these subsets has captured 2/3 of the elements
of S. We call this ratio r. Let us now look at each member of S, and check how 
many of these three subsets have included that member. For example, a is in 
two of them, so we say a is captured by 2/3 of these subsets. We will obtain 
the same value 2/3 for all elements of S. This value is the same as r. This is 
a consequence of the fact that we have used all such 2-member subsets and we 
have not biased this collection towards any element of S. With this observation, 
we begin a larger example. 

Consider a set of 10 points in a one-dimensional feature space F. Let this 
set be called A. Assume that F contains only points in A and nothing else. Let 
each point in A be identified as $q_0, q_1, \ldots, q_9$ as follows.

q_0  q_1  q_2  q_3  q_4  q_5  q_6  q_7  q_8  q_9

Now consider the subsets of F. Let the collection of all such subsets be
$\mathcal{M}$, which is the power set of F. We call each member m of $\mathcal{M}$ a model, and
we restrict our consideration to only those models that contain 5 points in A,
therefore each model has a size that is 0.5 of the size of A. Let this set of models
be called $M_{0.5,A}$. Some members of $M_{0.5,A}$ are as follows.

$\{q_0, q_1, q_2, q_3, q_4\}$
$\{q_0, q_1, q_2, q_3, q_5\}$
$\{q_0, q_1, q_2, q_3, q_6\}$

There are C(10, 5) = 252 members in $M_{0.5,A}$. Let M be a pseudo-random
permutation of members in $M_{0.5,A}$ as listed in Table 1 in the Appendix. We identify
models in this sequence by a single subscript such that $M = m_1, m_2, \ldots, m_{252}$.
We expand a collection $M_t$ by including more and more members of $M_{0.5,A}$
in the order of the sequence M as follows. $M_1 = \{m_1\}$, $M_2 = \{m_1, m_2\}$, ...,
$M_t = \{m_1, m_2, \ldots, m_t\}$.

Since each model covers some points in A, for each member q in A, we can
count the number of models in $M_t$ that include q, call this count $N(q, M_t)$, and
calculate the ratio of this count over the size of $M_t$, call it $Y(q, M_t)$. That is,
$Y(q, M_t) = \mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_t)$. As $M_t$ expands, this ratio changes and we
show these changes for each q in Table 2 in the Appendix. The values of $Y(q, M_t)$
are plotted in Figure 1. As is clearly visible in the figure, the values of $Y(q, M_t)$
converge to 0.5 for each q. Also notice that because of the randomization, we
have expanded $M_t$ in a way that $M_t$ is not biased towards any particular q,
therefore the values of $Y(q, M_t)$ are similar after $M_t$ has acquired a certain size
(say, when t = 80). When $M_t = M_{0.5,A}$, every point q is covered by the same
number of models in $M_t$, and their values of $Y(q, M_t)$ are identical and equal
to 0.5, which is the ratio of the size of each m relative to A (recall that we always
include 5 points from A in each m).
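
The convergence described above can be reproduced with a few lines of code. The
following is a minimal sketch, assuming a seeded Python shuffle in place of the
permutation of Table 1 (so intermediate values will differ from those in Table 2);
the names `coverage_ratio` and `SEED` are ours and do not come from the original
papers.

```python
import random
from itertools import combinations

POINTS = range(10)   # indices of q_0 ... q_9
SEED = 0             # assumption: arbitrary seed, not the ordering used in Table 1

# All C(10, 5) = 252 five-point models, placed in a pseudo-random order.
models = [frozenset(c) for c in combinations(POINTS, 5)]
random.Random(SEED).shuffle(models)

def coverage_ratio(q, t):
    """Y(q, M_t): fraction of the first t models that contain point q."""
    return sum(q in m for m in models[:t]) / t

for t in (10, 80, 252):
    print(t, [round(coverage_ratio(q, t), 3) for q in POINTS])
```

Whatever the seed, the last row must be 0.5 for every point, since each point
belongs to C(9, 4) = 126 of the 252 models.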

Formally, when t = 252, $M_t = M_{0.5,A}$, from the perspective of a fixed q, the
probability of it being contained in a model m from $M_t$ is

\[ \mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_{0.5,A}) = 0.5. \]

We emphasize that this probability is a measure in the space $\mathcal{M}$ by writing the
probability as $\mathrm{Prob}_{\mathcal{M}}$. On the other hand, by the way each m is constructed, we
know that from the perspective of a fixed m,

\[ \mathrm{Prob}_F(q \in m \mid q \in A) = 0.5. \]

Note that this probability is a measure in the space F. We have shown that
these two probabilities, w.r.t. two different spaces, have identical values. In other
words, let the membership function of m be $C_m(q)$, i.e., $C_m(q) = 1$ iff $q \in m$; the
random variables $\lambda q\, C_m(q)$ and $\lambda m\, C_m(q)$ have the same probability distribution,
when q is restricted to A and m is restricted to $M_{0.5,A}$. This is because both
variables can have values that are either 1 or 0, and they have the value 1 with
the same probability (0.5 in this case). This symmetry arises from the fact that
the collection of models $M_{0.5,A}$ covers the set A uniformly, i.e., since we have
used all members of $M_{0.5,A}$, each point q has the same chance to be included
in one of these models. If any two points in a set S have the same chance to
be included in a collection of models, we say that this collection is S-uniform.
It can be shown, by a simple counting argument, that uniformity leads to the
symmetry of $\mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_{0.5,A})$ and $\mathrm{Prob}_F(q \in m \mid q \in A)$, and hence
distributions of $\lambda q\, C_m(q)$ and $\lambda m\, C_m(q)$.
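
One way to spell out the counting argument is sketched below. It is our own
wording rather than a quotation of Kleinberg's proof, and it assumes that every
model in a collection M captures the same fraction r of a set S and that M is
S-uniform, so that the count N(q, M) is the same for all q in S. Counting the
pairs (q, m) with q in S and q in m in two ways gives the result.

```latex
% Double counting of pairs (q, m) with q \in S, m \in M, and q \in m.
% Assumptions: |m \cap S| = r |S| for every m \in M, and M is S-uniform.
\begin{align*}
\sum_{m \in M} |m \cap S| &= \sum_{q \in S} N(q, M), \\
r \, |S| \, |M| &= |S| \, N(q, M), \\
\frac{N(q, M)}{|M|} &= r .
\end{align*}
```

The left-hand side of the last line is the probability, over the models, that a
fixed point is covered; the right-hand side is the probability, over the points,
that a fixed model covers a point.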

The observation and utilization of this duality are central to the theory of
stochastic discrimination. A critical point of the SD method is to enforce such a 
uniform cover on a set of points. That is, to construct a collection of models in 
a balanced way so that the uniformity (hence the duality) is achieved without 
exhausting all possible models from the space. 

Fig. 1. Plot of $Y(q, M_t)$ versus t. Each line represents the trace of $Y(q, M_t)$ for a
particular q as $M_t$ expands.



3 Two-Class Discrimination 

Let us now label each point q in A by one of two classes $c_1$ (marked by "x") and
$c_2$ (marked by "o") as follows.

 x    x    x    o    o    o    o    x    x    o
q_0  q_1  q_2  q_3  q_4  q_5  q_6  q_7  q_8  q_9

This gives a training set $TR_i$ for each class $c_i$. In particular,

\[ TR_1 = \{q_0, q_1, q_2, q_7, q_8\}, \]

and

\[ TR_2 = \{q_3, q_4, q_5, q_6, q_9\}. \]




How can we build a classifier for $c_1$ and $c_2$ using models from $M_{0.5,A}$? First, we
evaluate each model m by how well it has captured the members of each class.
Define ratings $r_i$ (i = 1, 2) for each m as

\[ r_i(m) = \mathrm{Prob}_F(q \in m \mid q \in TR_i). \]

For example, consider model $m_1 = \{q_3, q_5, q_6, q_8, q_9\}$, where $q_8$ is in $TR_1$ and the
rest are in $TR_2$. $TR_1$ has 5 members and 1 is in $m_1$, therefore $r_1(m_1) = 1/5 = 0.2$.
$TR_2$ has (incidentally, also) 5 members and 4 of them are in $m_1$, therefore
$r_2(m_1) = 4/5 = 0.8$. Thus these ratings represent the quality of the models as a
description of each class. A model with a rating 1.0 for a class is a perfect model
for that class. We call the difference between $r_1$ and $r_2$ the degree of enrichment
of m with respect to classes (1,2), i.e., $d_{12} = r_1 - r_2$. A model m is enriched if
$d_{12} \neq 0$. Now we define, for all enriched models m,

\[ X_{12}(q, m) = \frac{C_m(q) - r_2(m)}{r_1(m) - r_2(m)}, \]

and let $X_{12}(q, m)$ be 0 if $d_{12}(m) = 0$. For a given m, $r_1$ and $r_2$ are fixed, and the
value of $X_{12}(q, m)$ for each q in A can have one of two values depending on whether
q is in m. For example, for $m_1$, $r_1 = 0.2$ and $r_2 = 0.8$, so $X_{12}(q, m_1) = -1/3$ for
points $q_3, q_5, q_6, q_8, q_9$, and $X_{12}(q, m_1) = 4/3$ for points $q_0, q_1, q_2, q_4, q_7$. Next, for
each set $M_t = \{m_1, m_2, \ldots, m_t\}$, we define a discriminant

\[ Y_{12}(q, M_t) = \frac{1}{t} \sum_{k=1}^{t} X_{12}(q, m_k). \]

As the set $M_t$ expands, the value of $Y_{12}$ changes for each q. We show, in
Table 3 in the Appendix, the values of $Y_{12}$ for each $M_t$ and each q, and for each
new member $m_t$ of $M_t$, $r_1$, $r_2$, and the two values of $X_{12}$. The values of $Y_{12}$ for
each q are plotted in Figure 2.
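
The ratings, the per-model term and the discriminant can be traced with exact
arithmetic. The sketch below is illustrative only: the function names `rating`,
`x12` and `y12` are ours, points are represented by their indices 0 to 9, and
the full set of 252 models is used instead of the ordering of Table 1.

```python
from fractions import Fraction
from itertools import combinations

TR1 = {0, 1, 2, 7, 8}   # indices of the points in class c_1
TR2 = {3, 4, 5, 6, 9}   # indices of the points in class c_2

def rating(m, tr):
    """r_i(m): fraction of the training set tr captured by model m."""
    return Fraction(len(m & tr), len(tr))

def x12(q, m):
    """X_12(q, m), taken to be 0 when the model is not enriched."""
    r1, r2 = rating(m, TR1), rating(m, TR2)
    if r1 == r2:
        return Fraction(0)
    return (Fraction(int(q in m)) - r2) / (r1 - r2)

def y12(q, models):
    """Y_12(q, M_t): average of X_12(q, m) over the models in M_t."""
    return sum(x12(q, m) for m in models) / len(models)

m1 = frozenset({3, 5, 6, 8, 9})
all_models = [frozenset(c) for c in combinations(range(10), 5)]
print(x12(8, m1), x12(0, m1))                   # a point inside m_1, a point outside
print([y12(q, all_models) for q in range(10)])  # full ensemble, t = 252
```

Using `Fraction` avoids rounding, so the values for the full ensemble come out
as exactly 1 for the points of $TR_1$ and exactly 0 for the points of $TR_2$.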

In Figure 2 we see two separate trends. All those points that belong to class
$c_1$ have their $Y_{12}$ values converging to 1.0, and all those in $c_2$ converging to 0.0.
Thus $Y_{12}$ can be used with a threshold to classify an arbitrary point q. We can
assign q to class $c_1$ if $Y_{12}(q, M_t) > 0.5$, and to class $c_2$ if $Y_{12}(q, M_t) < 0.5$, and
remain undecided when $Y_{12}(q, M_t) = 0.5$. Observe that this classifier is fairly
accurate far before $M_t$ has expanded to the full set $M_{0.5,A}$. We can also change
the two poles of $Y_{12}$ to 1.0 and -1.0 respectively by simply rescaling and shifting
$X_{12}$:

\[ X_{12}(q, m) = 2 \left( \frac{C_m(q) - r_2(m)}{r_1(m) - r_2(m)} \right) - 1. \]
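
The threshold rule can be tried on growing collections of models. This is a
sketch under stated assumptions: a seeded shuffle stands in for the ordering of
Table 1, the helper names are ours, and an undecided point is counted as an
error. The error counts for small t therefore depend on the seed.

```python
import random
from fractions import Fraction
from itertools import combinations

TR1, TR2 = {0, 1, 2, 7, 8}, {3, 4, 5, 6, 9}

def x12(q, m):
    """X_12(q, m) for five-point training sets; 0 for non-enriched models."""
    r1 = Fraction(len(m & TR1), 5)
    r2 = Fraction(len(m & TR2), 5)
    return (int(q in m) - r2) / (r1 - r2) if r1 != r2 else Fraction(0)

def classify(q, models):
    """Return 1 or 2 by thresholding Y_12 at 0.5, or None when undecided."""
    y = sum(x12(q, m) for m in models) / len(models)
    if y == Fraction(1, 2):
        return None
    return 1 if y > Fraction(1, 2) else 2

models = [frozenset(c) for c in combinations(range(10), 5)]
random.Random(0).shuffle(models)   # assumption: not the permutation of Table 1

for t in (5, 20, 80, 252):
    errors = sum(classify(q, models[:t]) != (1 if q in TR1 else 2)
                 for q in range(10))
    print(t, errors)
```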

How did this separation of trends happen? Let us now take a closer look at 
the models in each $M_t$ and see how many of them cover each point q. For a
given $M_t$, among its members, there can be different values of $r_1$ and $r_2$. But
because of our choices of the sizes of $TR_1$, $TR_2$, and m, we have only a small
set of distinct values that $r_1$ and $r_2$ can have. Namely, since each model has 5
points, there are only six possibilities as follows.

Fig. 2. Plot of $Y_{12}(q, M_t)$ versus t. Each line represents the trace of $Y_{12}(q, M_t)$ for a
particular q as $M_t$ expands.



no. of points from TR_1     0    1    2    3    4    5
no. of points from TR_2     5    4    3    2    1    0
r_1                        0.0  0.2  0.4  0.6  0.8  1.0
r_2                        1.0  0.8  0.6  0.4  0.2  0.0



Note that in a general setting $r_1$ and $r_2$ do not have to sum up to 1. If we
included models of a larger size, say, one with 10 points, we can have both $r_1$
and $r_2$ equal to 1.0. We have simplified matters by using models of a fixed size
and training sets of the same size. According to the values of $r_1$ and $r_2$, in this
case we have only 6 different kinds of models.
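
For reference, the number of models of each kind follows from a simple count: a
model with k points from $TR_1$ takes the remaining 5 - k points from $TR_2$. The
short sketch below (variable names are ours) tabulates these counts.

```python
from math import comb

# Number of 5-point models drawing k points from TR_1 and 5 - k from TR_2.
counts = {k: comb(5, k) * comb(5, 5 - k) for k in range(6)}

for k, n in counts.items():
    print(f"r1 = {k / 5:.1f}, r2 = {(5 - k) / 5:.1f}: {n} models")
```

The six counts add up to the 252 models of $M_{0.5,A}$.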

Now we take a detailed look at the coverage of each point q by each kind
of models, i.e., models of a particular rating (quality) for each class. Let us
count how many of the models of each value of $r_1$ and $r_2$ cover each point q,
and call this $N_{M_t,r_1,TR_1}(q)$ and $N_{M_t,r_2,TR_2}(q)$ respectively. We can normalize
this count by the number of models having each value of $r_1$ or $r_2$, and obtain
a ratio $f_{M_t,r_1,TR_1}(q)$ and $f_{M_t,r_2,TR_2}(q)$ respectively. Thus, for each point q, we
have "a profile of coverage" by models of each value of ratings $r_1$ and $r_2$ that is
described by these ratios. For example, point $q_0$ at t = 10 is covered by only 5
of the models in $M_{10}$, and from Table 3 we know that $M_{10}$ has
various numbers of models in each rating as summarized in the following table.



r_1                               0.0   0.2   0.4   0.6   0.8   1.0
no. of models in M_10 with r_1     0     2     2     4     2     0
N_{M_10,r_1,TR_1}(q_0)             0     0     0     3     2     0
f_{M_10,r_1,TR_1}(q_0)             0     0     0    0.75  1.0    0

r_2                               0.0   0.2   0.4   0.6   0.8   1.0
no. of models in M_10 with r_2     0     2     4     2     2     0
N_{M_10,r_2,TR_2}(q_0)             0     2     3     0     0     0
f_{M_10,r_2,TR_2}(q_0)             0    1.0   0.75   0     0     0

We show such profiles for each point q and each set $M_t$ in Figure 3 (as a
function of $r_1$) and Figure 4 (as a function of $r_2$) respectively.

Observe that as t increases, the profiles of coverage for each point q converge
to two distinct patterns. In Figure 3, the profiles for points in $TR_1$ converge
to a diagonal $f_{M_t,r_1,TR_1} = r_1$, and in Figure 4, those for points in $TR_2$ also
converge to a diagonal $f_{M_t,r_2,TR_2} = r_2$. That is, when $M_t = M_{0.5,A}$, we have
for all q in $TR_1$ and for all $r_1$, $\mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_{r_1,TR_1}) = r_1$, and for all
q in $TR_2$ and for all $r_2$, $\mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_{r_2,TR_2}) = r_2$. Thus we have the
symmetry in place for both $TR_1$ and $TR_2$. This is a consequence of $M_t$ being
both $TR_1$-uniform and $TR_2$-uniform.
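
The diagonal profile at t = 252 can be checked directly. The following sketch
(the function name `profile` is ours, and points are again represented by their
indices) groups the models by rating and reports, for a chosen point, the
fraction of each group that covers it.

```python
from fractions import Fraction
from itertools import combinations

TR1 = {0, 1, 2, 7, 8}
models = [frozenset(c) for c in combinations(range(10), 5)]

def profile(q, models, tr):
    """Map each rating r to the fraction of models with that rating that cover q."""
    covered, total = {}, {}
    for m in models:
        r = Fraction(len(m & tr), len(tr))
        total[r] = total.get(r, 0) + 1
        covered[r] = covered.get(r, 0) + (q in m)
    return {r: Fraction(covered[r], total[r]) for r in sorted(total)}

print(profile(0, models, TR1))   # q_0 belongs to TR_1
```

For every point of $TR_1$ the reported fraction equals the rating itself, which is
the diagonal pattern described above.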

The discriminant $Y_{12}(q, M_t)$ is a summation over all models m in $M_t$, which
can be decomposed into the sums of terms corresponding to different ratings $r_i$
for either i = 1 or i = 2. To understand what happens with the points in $TR_1$,
we can decompose their $Y_{12}$ by values of $r_1$. Assume that there are $t_x$ models in
$M_t$ that have $r_1 = x$. Since we have only 6 distinct values for x, $M_t$ is a union
of 6 disjoint sets, and $Y_{12}$ can be decomposed as

\[ Y_{12}(q, M_t) = \sum_{x} \frac{t_x}{t} \left[ \frac{1}{t_x} \sum_{k_x = 1}^{t_x} X_{12}(q, m_{k_x}) \right], \qquad x \in \{0.0, 0.2, 0.4, 0.6, 0.8, 1.0\}. \]

The factor in the square bracket of each term is the expectation of values
of $X_{12}$ corresponding to that particular rating $r_1 = x$. Since $r_1$ is the same for
all m contributing to that term, by our choice of sizes of $TR_1$, $TR_2$, and the
models, $r_2$ is also the same for all those m relevant to that term. Let that value
of $r_2$ be y, we have, for each (fixed) q, each value of x and the associated value
y,

\[ E(X_{12}(q, m_x)) = E\left( \frac{C_{m_x}(q) - y}{x - y} \right) = \frac{E(C_{m_x}(q)) - y}{x - y} = \frac{x - y}{x - y} = 1. \]

The second to the last equality is a consequence of the uniformity of $M_t$:
because the collection $M_t$ (when t = 252) covers $TR_1$ uniformly, we have for
each value x, $\mathrm{Prob}_{\mathcal{M}}(q \in m \mid m \in M_{x,TR_1}) = x$, and since $C_{m_x}(q)$ has only two
values (0 or 1), and $C_{m_x}(q) = 1$ iff $q \in m$, we have the expected value of $C_{m_x}(q)$
equal to x. Therefore

\[ Y_{12}(q, M_t) = \sum_{x} \frac{t_x}{t} = 1. \]

In a more general case, the values of $r_2$ are not necessarily equal for all models
with the same value for $r_1$, so we cannot take y and x - y out as constants. But
then we can further split the term by the values of $r_2$, and proceed with the
same argument.

A similar decomposition of $Y_{12}$ into terms corresponding to different values
of $r_2$ will show that $Y_{12}(q, M_t) = 0$ for those points in $TR_2$.



4 Projectability of Models 

We have built a classifier and shown that it works for $TR_1$ and $TR_2$. How can
this classifier work for an arbitrary point that is not in $TR_1$ or $TR_2$? Suppose
that the feature space F contains other points p (marked by ","), and that each
p is close to some training point q (marked by ".") as follows.

q_0, p_0   q_1, p_1   q_2, p_2   q_3, p_3   q_4, p_4   q_5, p_5   q_6, p_6   q_7, p_7   q_8, p_8   q_9, p_9

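We can take the models m as regions in the space that cover the points q
in the same manner as before. Say, each point $q_i$ has a particular value of
the feature v (in our one-dimensional feature space) that is $v(q_i)$. We can define
a model by ranges of values for this feature, e.g., in our example $m_1$ covers
$q_3, q_5, q_6, q_8, q_9$, so we take

\[ m_1 = \left\{ q \;\middle|\; \tfrac{v(q_2) + v(q_3)}{2} < v(q) < \tfrac{v(q_3) + v(q_4)}{2} \right\} \cup
\left\{ q \;\middle|\; \tfrac{v(q_4) + v(q_5)}{2} < v(q) < \tfrac{v(q_6) + v(q_7)}{2} \right\} \cup
\left\{ q \;\middle|\; \tfrac{v(q_7) + v(q_8)}{2} < v(q) \right\}. \]

Thus we can tell if an arbitrary point p with value v(p) for this feature is
inside or outside this model.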

We can calculate the model's ratings in exactly the same way as before, using
only the points q. But now the same classifier works for the new points p, since we
can use the new definitions of models to determine if p is inside or outside each
model. Given the proximity relationship as above, those points will be assigned
to the same class as their closest neighboring q. If these are indeed the true classes
for the points p, the classifier is perfect for this new set. In the SD terminology,
if we call the two subsets of points p that should be labeled as two different
classes $TE_1$ and $TE_2$, i.e., $TE_1 = \{p_0, p_1, p_2, p_7, p_8\}$, $TE_2 = \{p_3, p_4, p_5, p_6, p_9\}$,
we say that $TR_1$ and $TE_1$ are $M_t$-indiscernible, and similarly $TR_2$ and $TE_2$
are also $M_t$-indiscernible. This is to say, from the perspective of $M_t$, there is no
difference between $TR_1$ and $TE_1$, or $TR_2$ and $TE_2$, therefore all the properties
of $M_t$ that are observed using $TR_1$ and $TR_2$ can be projected to $TE_1$ and $TE_2$.
The central challenge of an SD method is to maintain projectability, uniformity,
and enrichment of the collection of models at the same time.
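
A model defined by feature ranges can be sketched as follows. The coordinates
are assumptions made purely for illustration: we take $v(q_i) = i$, place each
unseen point $p_i$ at i + 0.2, and put interval boundaries at the midpoints
between neighboring training points. The helper names are ours.

```python
# Assumptions for illustration: v(q_i) = i, and each unseen point p_i lies at i + 0.2.
v_q = [float(i) for i in range(10)]
v_p = [i + 0.2 for i in range(10)]

def interval_model(members, values):
    """Turn a set of point indices into open intervals bounded by midpoints."""
    intervals = []
    for i in sorted(members):
        lo = (values[i - 1] + values[i]) / 2 if i > 0 else float("-inf")
        hi = (values[i] + values[i + 1]) / 2 if i < len(values) - 1 else float("inf")
        if intervals and intervals[-1][1] == lo:
            intervals[-1] = (intervals[-1][0], hi)   # merge adjacent intervals
        else:
            intervals.append((lo, hi))
    return intervals

def contains(intervals, v):
    """Membership test for an arbitrary feature value v."""
    return any(lo < v < hi for lo, hi in intervals)

m1 = interval_model({3, 5, 6, 8, 9}, v_q)
print(m1)
print([i for i, v in enumerate(v_p) if contains(m1, v)])
```

Under these assumptions the model covers exactly those unseen points whose
nearest training point it covers, which is the indiscernibility property
described above.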



5 Developments of SD Theory and Algorithms 



5.1 Algorithmic Implementations 

The method of stochastic discrimination constructs a classifier by combining a 
large number of simple discriminators that are called weak models. A weak model 
is simply a subset of the feature space. Conceptually, the classifier is constructed 
by a three-step process: (1) weak model generation, (2) weak model evaluation, 
and (3) weak model combination. The generator enumerates weak models in an 
arbitrary order and passes them on to the evaluator. The evaluator has access to 
the training set. It rates and filters the weak models according to their capability 
in capturing points of each class, and their contribution to satisfying the uni- 
formity condition. The combinator then produces a discriminant function that 
depends on a point's membership status with respect to each model, and the 
models' ratings. At classification, a point is assigned to the class for which this 
discriminant has the highest value. Informally, the method captures the intuition 
of gaining wisdom from graded random guesses. 

Weak model generation. Two guidelines should be observed in generating 
the weak models: 

(1) projectability: A weak model should be able to capture more than one 
point so that the solution can be projectable to points not included in the training
set. Geometrically, this means that a useful model must be of certain minimum 
size, and it should be able to capture points that are considered neighbors of 
one another. To guarantee similar accuracies of the classifier (based on similar 
ratings of the weak models) on both training and testing data, one also needs 
an assumption that the training data are representative. Data representativeness 
and model projectability are two sides of the same question. More discussions of 
this can be found in [1]. A weak model defines a neighborhood in the space, and 
we need a training sample in a neighborhood of every unseen sample. Otherwise, 
since our only knowledge of the class boundaries is from the given training set, 
there can be no basis for any inference concerning regions of the feature space 
where no training samples are given. 

(2) simplicity of representation: A weak model should have a simple repre- 
sentation. That means, the membership of an arbitrary point with respect to a 
model must be cheaply computable. To illustrate this, consider representing a 
model as a listing of all the points it contains. This is practically useless since 
the resultant solution could be as expensive as an exhaustive template matching 
using all the points in the feature space. An example of a model with a simple 
representation is a half-plane in a two-dimensional feature space. 

Conditions (1) and (2) restrict the type of weak models yet by no means 
reduce the number of candidates to any tangible limit. To obtain an unbiased 
collection of the candidates with minimum effort, random sampling with replace- 
ment is useful. The training of the method thus relies on a stochastic process 
which, at each iteration, generates a weak model that satisfies the above condi- 
tions. 



A convenient way to generate weak models randomly is to use a type of 
model that can be described by a small number of parameters. The values of the 
parameters can be chosen pseudo-randomly. Some example types of models that 
can be generated this way include (1) half-spaces bounded by a threshold on a
randomly selected feature dimension; (2) half-spaces bounded by a hyperplane
equidistant from two randomly selected points; (3) regions bounded by two
parallel hyperplanes perpendicular to a randomly selected axis; (4) hypercubes
centered at randomly selected points with edges of varying lengths; (5) balls
(based on the city-block metric) centered at randomly selected points with
randomly selected radii; and (6) balls (based on the Euclidean metric) centered at
randomly selected points with randomly selected radii. A model can also be a
union or intersection of several regions of these types. An implementation of SD
using hyper-rectangular boxes as weak models is described in [9].
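
Two of the model types listed above, (1) and (6), can be generated with a few
parameters each. The sketch below is not taken from any published SD
implementation; the function names, the bounding box and the maximum radius are
assumptions chosen for illustration. Each model is returned as a membership
function, which keeps the membership test cheap.

```python
import random

def random_threshold_model(lo, hi, rng):
    """Half-space: one randomly chosen feature compared against a random threshold."""
    d = rng.randrange(len(lo))
    thr = rng.uniform(lo[d], hi[d])
    return lambda x: x[d] <= thr

def random_ball_model(lo, hi, max_radius, rng):
    """Euclidean ball with a random center inside the box [lo, hi] and a random radius."""
    center = [rng.uniform(a, b) for a, b in zip(lo, hi)]
    radius = rng.uniform(0.0, max_radius)
    return lambda x: sum((xi - ci) ** 2 for xi, ci in zip(x, center)) <= radius ** 2

rng = random.Random(0)
lo, hi = [0.0, 0.0], [1.0, 1.0]   # assumed bounding box of the training data
stream = [random_threshold_model(lo, hi, rng) for _ in range(3)]
stream += [random_ball_model(lo, hi, 0.5, rng) for _ in range(3)]

# Membership of two sample points in each generated model.
print([[m(p) for p in ([0.1, 0.1], [0.9, 0.9])] for m in stream])
```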

A number of heuristics may be used in creating these models. These heuristics 
specify the way random points are chosen from the space, or set limits on the 
maximum and minimum sizes of the models. By this we mean restricting the 
choice of random points to, for instance, points in the space whose coordinates 
fall inside the range of those of the training samples, or restricting the radii 
of the balls to, for instance, a fraction of the range of values in a particular 
feature dimension. The purpose of these heuristics is to speed up the search for 
acceptable models by confining the search within the most interesting regions, 
or to guarantee a minimum model size. 

Enrichment enforcement. The enrichment condition is relatively easy to 
enforce, as models biased towards one class are most common. But since the 
strength of the biases ($|d_{ij}(m)|$) determines the rate at which accuracy increases,
we tend to prefer to use models with an enrichment degree further away from 
zero. 

One way to implement this is to use a threshold on the enrichment degree to 

select weak models from the random stream so that they are of some minimum 
quality. In this way, one will be able to use a smaller collection of models to yield 
a classifier of the same level of accuracy. However, there are tradeoffs involved 
in doing this. For one thing, models of higher rating are less likely to appear in 
the stream, and so more random models have to be explored in order to find 
sufficient numbers of higher quality weak models. And once the type of model is 
fixed and the value of the threshold is set, there is a risk that such models may 
never be found. 

Alternatively, one can use the most enriched model found in a pre-determined 
number of trials. This also makes the time needed for training more predictable, 
and it permits a tradeoff between training time and quality of the weak models. 
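
The "best of a fixed number of trials" strategy can be sketched as below. The
toy one-dimensional data, the interval-shaped models and the names `enrichment`
and `most_enriched` are assumptions for illustration; models are represented as
membership functions.

```python
import random

def enrichment(model, class_a, class_b):
    """d(m) = r_a(m) - r_b(m), with ratings measured on the training points."""
    r_a = sum(model(x) for x in class_a) / len(class_a)
    r_b = sum(model(x) for x in class_b) / len(class_b)
    return r_a - r_b

def most_enriched(generate, class_a, class_b, trials):
    """Draw `trials` candidate models and keep the one with the largest |d(m)|."""
    candidates = [generate() for _ in range(trials)]
    return max(candidates, key=lambda m: abs(enrichment(m, class_a, class_b)))

# Toy 1-D data and interval models, both assumed for illustration.
class_a, class_b = [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]
rng = random.Random(0)

def generate():
    a, b = sorted(rng.uniform(0.0, 1.0) for _ in range(2))
    return lambda x: a <= x <= b

best = most_enriched(generate, class_a, class_b, trials=20)
print(enrichment(best, class_a, class_b))
```

The number of trials fixes the training cost per retained model, whereas a
threshold on $|d(m)|$ fixes the quality and leaves the cost open.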

In enriching the model stream, it is important to remember that if the quality 
of weak models selected is allowed to get too high, there is a risk that they will 
become training set specific, that is, less likely to be projectable to unseen sam- 
ples. This could present a problem since the projectability of the final classifier 
is directly based on the projectability of its component weak models. 



Uniformity promotion. The uniformity condition is much more difficult to 
satisfy. Strict uniformity requires that every point be covered by the same num- 
ber of weak models of every combination of per-class ratings. This is rather 
infeasible for continuous and unconstrained ratings. 

One useful strategy is to use only weak models of a particular rating. In such
cases, the ratings $r_i(m)$ and $r_j(m)$ are the same for all models m enriched for the
discrimination between classes i and j, so we need only to make sure that each
point is included in the same number of models. To enforce this, models can be
created in groups such that each group partitions the entire space into a set of
non-overlapping regions. An example is to use leaves of a fully split decision tree,
where each leaf is perfectly enriched for one class, and each point is covered by
exactly one leaf of each tree. For any pairwise discrimination between classes i
and j, we can use only those leaves of the trees that contain only points of class
i. In other words, $r_i(m)$ is always 1 and $r_j(m)$ is always 0. Constraints are put
in the tree-construction process to guarantee some minimum projectability.

With other types of models, a first step to promote uniformity is to use 
models that are unions of small regions with simple boundaries. The component 
regions may be scattered throughout the space. These models have simple rep- 
resentations but can describe complicated class boundaries. They can have some 
minimum size and hence good projectability. At the same time, the scattered 
locations of component regions do not tend to cover large areas repeatedly. 

A more sophisticated way to promote uniformity involves defining a measure 
of the lack of uniformity and an algorithm to minimize such a measure. The goal 
is to create or retain more models located in areas where the coverage is thinner. 
An example of such a measure is the count of those points that are covered by
a less-than-average number of previously retained models. For each point x in the
class $c_0$ to be positively enriched, we calculate, out of all previous models used
for that class, how many of them have covered x. If the coverage is less than
the average for class $c_0$, we call x a weak point. When a new model is created,
we check how many such weak points are covered by the new model. The ratio 
of the set of covered weak points to the set of all the weak points is used as 
a merit score of how well this model improves uniformity. We can accept only 
those models with a score over a pre-set threshold, or take the model with the 
best score found in a pre-set number of trials. One can go further to introduce 
a bias to the model generator so that models covering the weak points are more 
likely to be created. The latter turns out to be a very effective strategy that led
to good results in our experiments. 
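
The weak-point merit score can be written down in a few lines. This is a
minimal sketch, not the implementation used in the experiments: the names
`weak_points` and `uniformity_merit`, the toy data, and the choice to return 0
when there are no weak points are all assumptions.

```python
def weak_points(points, retained):
    """Indices of points covered by fewer than the average number of retained models."""
    cover = [sum(m(x) for m in retained) for x in points]
    mean = sum(cover) / len(cover)
    return [i for i, c in enumerate(cover) if c < mean]

def uniformity_merit(candidate, points, retained):
    """Fraction of the current weak points that the candidate model covers."""
    weak = weak_points(points, retained)
    if not weak:
        return 0.0   # assumption: with no weak points there is nothing to improve
    return sum(candidate(points[i]) for i in weak) / len(weak)

# Toy 1-D example (assumed data): both retained models favor small values.
points = [0.1, 0.2, 0.8, 0.9]
retained = [lambda x: x < 0.5, lambda x: x < 0.3]

print(weak_points(points, retained))
print(uniformity_merit(lambda x: x > 0.5, points, retained))
```

A candidate can then be accepted when its score exceeds a pre-set threshold, or
the best-scoring candidate out of a pre-set number of trials can be kept.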

5.2 Alternative Discriminants and Approximate Uniformity 

The method outlined above allows for rich possibilities of variation at the algo- 
rithmic level. The variations may be in the design of the weak model generator, 
or in ways to enforce the enrichment and uniformity conditions. It is also possible 
to change the definition of the discriminant, or to use different kinds of ratings. 

A variant of the discriminating function is studied in detail in [1]. In this 
variant, we define the ratings by 
for all i. It is an estimate of the posterior probability that a point belongs to
class i given the condition that it is included in model m. The discriminant for 
class i is defined to be: 



where $p_i$ is the number of models accumulated for class i.

It turns out that, with this discriminant, the classifier also approaches
perfection asymptotically, provided that an additional symmetry condition is
satisfied. The symmetry condition requires that the ensemble include the same
number of models for all permutations of (r'_1, r'_2, ..., r'_n). It prevents
biases created by using more (i, j)-enriched models than (j, i)-enriched models
for all pairs (i, j) [1]. Again, this condition may be enforced by using only
certain particular permutations of the r' ratings [7]. This alternative
discriminant is convenient for multi-class discrimination problems.
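
As a rough illustration of a rating that estimates the posterior probability of class i given membership in model m, the sketch below uses the fraction of training points captured by m that belong to class i. That formula is an assumption made for this example and is not quoted from [1]; the function name posterior_ratings is likewise hypothetical. The two training sets follow the class assignment used in the appendix listing.

```python
def posterior_ratings(model, training_sets):
    """Estimate P(class i | point in model) from training-point counts.

    Assumed form: |m & TR_i| / |m & TR|, where TR is the union of all TR_i.
    """
    counts = [len(set(model) & set(tr)) for tr in training_sets]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)  # the model covers no training point
    return [c / total for c in counts]


# Class assignment as in the appendix listing: points 0, 1, 2, 7, 8 are class 1.
TR1 = [0, 1, 2, 7, 8]
TR2 = [3, 4, 5, 6, 9]

ratings = posterior_ratings([3, 5, 6, 8, 9], [TR1, TR2])
print(ratings)
```

For the model {3, 5, 6, 8, 9}, one of the five captured points is in class 1 and four are in class 2, so the ratings sum to one across classes.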

The SD theory established the mathematical concepts of enrichment, uni- 
formity, and projectability of a weak model ensemble. Bounds on classification 
accuracy are developed based on strict requirements on these conditions, which 
is a mathematical idealization. In practice, there are often difficult tradeoffs 
among the three conditions. Thus it is important to understand how much of 
the classification performance is affected when these conditions are weakened. 
This is the subject of study in [3], where notions of near uniformity and weak 
indiscernibility are introduced and their implications are studied. 

5.3 Structured Collections of Weak Models 

As a constructive procedure, the method of stochastic discrimination depends on
a detailed control of the uniformity of model coverage, which is outlined but not
fully published in the literature [17]. The method of random subspaces followed
these ideas but attempted a different approach. Instead of obtaining weak
discrimination and projectability through simplicity of the model form, and forcing
uniformity by sophisticated algorithms, the method uses complete, locally pure
partitions as given in fully split decision trees [10] or nearest neighbor classifiers
[11] to achieve strong discrimination and uniformity, and then explicitly forces
different generalization patterns on the component classifiers. This is done by
training large capacity component classifiers such as nearest neighbors and
decision trees to fully fit the data, but restricting the training of each classifier to a
coordinate subspace of the feature space where all the data points are projected,
so that classifications remain invariant in the complement subspace. If there is
no ambiguity in the subspaces, the individual classifiers maintain maximum
accuracy on the training data, with no cases deliberately chosen to be sacrificed,
and thus the method does not run into the paradox of sacrificing some training
points in the hope for better generalization accuracy. This amounts to creating a
collection of weak models in a structured way.
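
The idea of training each component on a coordinate subspace can be sketched with nearest neighbor classifiers and a plain majority vote. This is an illustrative toy, not the procedure of [10] or [11]: the function names, the vote rule, the fixed seed, and the four training points are all assumptions made for the example.

```python
import random


def nearest_label(x, train, dims):
    """1-NN label of x, measuring distance only on the coordinates in dims."""
    def dist2(a, b):
        return sum((a[d] - b[d]) ** 2 for d in dims)
    return min(train, key=lambda item: dist2(x, item[0]))[1]


def random_subspace_predict(x, train, n_classifiers, subspace_size, seed=0):
    """Majority vote of 1-NN classifiers, each using a random coordinate subset."""
    rng = random.Random(seed)
    n_features = len(train[0][0])
    votes = {}
    for _ in range(n_classifiers):
        dims = rng.sample(range(n_features), subspace_size)
        label = nearest_label(x, train, dims)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Hypothetical training data: (feature vector, label) pairs.
train = [((0.0, 0.0, 0.0), "a"), ((0.1, 0.2, 0.1), "a"),
         ((1.0, 1.0, 1.0), "b"), ((0.9, 0.8, 1.1), "b")]

print(random_subspace_predict((0.05, 0.1, 0.0), train, 25, 2))
print(random_subspace_predict((0.95, 0.9, 1.0), train, 25, 2))
```

Because every component is a 1-NN classifier on its own subspace, each one reproduces the training labels exactly as long as no two training points of different classes coincide after projection.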

However, the tension among the three factors persists. There is another difficult
tradeoff in how much discriminating power to retain for the component
classifiers. Can each of them use only a single feature dimension so as to maximize
invariance in the complement dimensions? Also, projection to coordinate
subspaces sets parts of the decision boundaries parallel to the coordinate axes. 
Augmenting the raw features by simple transformations [10] introduces more 
flexibility, but it may still be insufficient for an arbitrary problem. Optimization 
of generalization performance will continue to depend on a detailed control of 
the projections to suit a particular problem. 



6 Conclusions 

The theory of stochastic discrimination identifies three and only three sufficient
conditions for a classifier to achieve maximum accuracy for a problem. These
are just the three elements long believed to be important in pattern recognition: 
discrimination power, complementary information, and generalization ability. It 
sets a foundation for theories of ensemble learning. Many current questions on 
classifier combination can have an answer in the arguments of the SD theory: 
What is good about building the classifier on weak models instead of strong 
models? Because weak models are easier to obtain, and their smaller capacity
renders them less sensitive to sampling errors in small training sets [20] [21], thus 
they are more likely to have similar coverage on the unseen points from the same 
problem. Why are many models needed? Because the method relies on the law 
of large numbers to reduce the variance of the discriminant on each single point. 
How should these models complement each other? The uniformity condition 
specifies exactly what kind of correlation is needed among the individual models. 

Finally, we emphasize that the accuracy of SD methods is not achieved by 
intentionally limiting the VC dimension of the complete system; the combination 
of many weak models can have a very large VC dimension. It is a consequence 
of the symmetry relating probabilities in the two spaces, and the law of large 
numbers. It is a structural property of the topology. The observation of this
symmetry and its relationship to ensemble learning is a deep insight of Kleinberg's
that we believe can lead to better understanding of other ensemble methods.

Appendix.

Listing of an awk script to calculate a model's ratings. The script takes a 
file where each line contains the point indices of the elements of a model (as in 
Table 1). 

# <awk.rating>
# run: awk -f awk.rating list_of_models > models.rating
# input:
# 3 5 6 8 9
# 0 1 2 6 8
# ...
BEGIN {
    sizeTR1 = 5;
    sizeTR2 = 5;

    # full space = (xxxooooxxo)
    c[0] = 1; c[1] = 1; c[2] = 1; c[3] = 2; c[4] = 2;
    c[5] = 2; c[6] = 2; c[7] = 1; c[8] = 1; c[9] = 2;
}

{
    cover1 = 0;
    cover2 = 0;
    sizeM = NF;

    for (i = 1; i <= NF; ++i) if (c[$i] == 1) cover1++; else cover2++;
    r1 = cover1/sizeTR1;
    r2 = cover2/sizeTR2;
    enr = r1 - r2;

    if (enr == 0.0) { xin = 0.0; xout = 0.0; }
    else {
        xin = (1.0 - r2)/enr;
        xout = (0.0 - r2)/enr;
    }

    printf "model %s r1 %.2f r2 %.2f enr = r1-r2 = %.2f xin %.2f xout %.2f\n", \
        $0, r1, r2, enr, xin, xout;
}
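
As a cross-check of the rating listing, the same arithmetic can be written in a few lines of Python. This is only a sketch for verification: the function name rate and the dictionary CLASS_OF are introduced here for illustration, while the class assignment, the set sizes, and the formulas follow the awk script.

```python
# Class of each point q_0..q_9, as in the BEGIN block of the awk script.
CLASS_OF = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2, 7: 1, 8: 1, 9: 2}
SIZE_TR1 = 5
SIZE_TR2 = 5


def rate(model):
    """Return (r1, r2, enr, xin, xout) for a model given as point indices."""
    cover1 = sum(1 for q in model if CLASS_OF[q] == 1)
    cover2 = len(model) - cover1
    r1 = cover1 / SIZE_TR1
    r2 = cover2 / SIZE_TR2
    enr = r1 - r2
    if enr == 0.0:
        return r1, r2, enr, 0.0, 0.0  # non-enriched model: no contribution
    xin = (1.0 - r2) / enr    # value of X for points inside the model
    xout = (0.0 - r2) / enr   # value of X for points outside the model
    return r1, r2, enr, xin, xout


for model in ([3, 5, 6, 8, 9], [0, 1, 2, 6, 8]):
    r1, r2, enr, xin, xout = rate(model)
    print("model %s r1 %.2f r2 %.2f enr %.2f xin %.2f xout %.2f"
          % (" ".join(map(str, model)), r1, r2, enr, xin, xout))
```

The two models used here are m_1 and m_2 of Table 1, so the printed ratings can be compared with the sample input lines shown in the second listing.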

Listing of an awk script to calculate the discriminant Y_12 for each point q,
given a list of models and their ratings (output of awk.rating).

# <awk.discrim>
# run: awk -f awk.discrim models.rating
# input:
# model 3 5 6 8 9 r1 0.20 r2 0.80 enr = r1-r2 = -0.60 xin -0.33 xout 1.33
# model 0 1 2 6 8 r1 0.80 r2 0.20 enr = r1-r2 = 0.60 xin 1.33 xout -0.33
# ...
{
    nm++;

    xin = $17;
    xout = $19;

    #
    # check for each point q, whether model m (this row) covers q,
    # and increment y[q] accordingly
    #
    for (q = 0; q < 10; ++q) {
        inmodel = 0;
        for (i = 2; i <= 6; ++i) {
            if ($i == q) inmodel = 1;
        }
        if (inmodel) {
            y[q] = (y[q]*(nm-1) + xin)/nm; }
        else {
            y[q] = (y[q]*(nm-1) + xout)/nm; }
    }

    printf "Y(M,q)_(|M|=%3d) ", nm;
    for (q = 0; q < 10; ++q) { printf "%5.2f ", y[q]; }
    printf "\n";
}
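
The running average computed by the discriminant listing can be sketched in Python as below. The function name discriminant and the tuple layout (model, xin, xout) are assumptions for this illustration; the update rule is the one in the awk script, and the two rated models are m_1 and m_2 with the exact values 4/3 and -1/3 in place of the rounded 1.33 and -0.33.

```python
def discriminant(rated_models, n_points=10):
    """Average of X(q, m) over the models: xin if m covers q, else xout."""
    y = [0.0] * n_points
    for nm, (model, xin, xout) in enumerate(rated_models, start=1):
        for q in range(n_points):
            x = xin if q in model else xout
            y[q] = (y[q] * (nm - 1) + x) / nm  # incremental mean, as in the listing
    return y


# m_1 and m_2 with their (xin, xout) values from the rating step.
rated = [({3, 5, 6, 8, 9}, -1 / 3, 4 / 3),
         ({0, 1, 2, 6, 8}, 4 / 3, -1 / 3)]

y = discriminant(rated)
print(" ".join("%5.2f" % v for v in y))
```

After only two models the values already separate some points: q_0, q_1, q_2 sit at 4/3 and q_3, q_5, q_9 at -1/3, while the remaining points are still at 0.5.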



Table 1. Models m_t in M_{0.5,A} in the order of M = m_1, m_2, ..., m_252. Each model
is shown with its elements denoted by the indices i of q_i in A. For example,
m_1 = {q_3, q_5, q_6, q_8, q_9}.

m_t   elements   m_t   elements   m_t    elements   m_t    elements   m_t    elements   m_t    elements
m1    35689      m43   12689      m85    24578      m127   01469      m169   02468      m211   02458
m2    01268      m44   04569      m86    23568      m128   03679      m170   35678      m212   13457
m3    04789      m45   01245      m87    01267      m129   04579      m171   03589      m213   24689
m4    25689      m46   01458      m88    01257      m130   01237      m172   34679      m214   03478
m5    02679      m47   15679      m89    05679      m131   24789      m173   12346      m215   23589
m6    34578      m48   12457      m90    24589      m132   45689      m174   12458      m216   24679
m7    13459      m49   02379      m91    04589      m133   16789      m175   35789      m217   02456
m8    01238      m50   02568      m92    12467      m134   13479      m176   02358      m218   05689
m9    12347      m51   12357      m93    13578      m135   02349      m177   35679      m219   12789
m10   01579      m52   14678      m94    02369      m136   13469      m178   13458      m220   02346
m11   34589      m53   12678      m95    12469      m137   03678      m179   01459      m221   23489
m12   03459      m54   23567      m96    04567      m138   23679      m180   03479      m222   23467
m13   23459      m55   02789      m97    14679      m139   46789      m181   14789      m223   12489
m14   02457      m56   24567      m98    13467      m140   01468      m182   23678      m224   14589
m15   02368      m57   13569      m99    45678      m141   03689      m183   03456      m225   25678
m16   02689      m58   01259      m100   03469      m142   02478      m184   13456      m226   12579
m17   01368      m59   23479      m101   34789      m143   23457      m185   01568      m227   03458
m18   13589      m60   03579      m102   45679      m144   02347      m186   01578      m228   01569
m19   14579      m61   12368      m103   01358      m145   01289      m187   01678      m229   45789
m20   23468      m62   23578      m104   01379      m146   01369      m188   12367      m230   12358
m21   26789      m63   02345      m105   01236      m147   01356      m189   12345      m231   02579
m22   15678      m64   01479      m106   01679      m148   12379      m190   25679      m232   01457
m23   04578      m65   03569      m107   13689      m149   02569      m191   02367      m233   05789
m24   04679      m66   01346      m108   12479      m150   34678      m192   01256      m234   01247
m25   02459      m67   24568      m109   14568      m151   24569      m193   13679      m235   03467
m26   12569      m68   01359      m110   15689      m152   03578      m194   04689      m236   12359
m27   01269      m69   12459      m111   01258      m153   02359      m195   04568      m237   02567
m28   06789      m70   01239      m112   12389      m154   01234      m196   12578      m238   12356
m29   01689      m71   24678      m113   03568      m155   01345      m197   12468      m239   02469
m30   01248      m72   01347      m114   23689      m156   02348      m198   03468      m240   13468
m31   12456      m73   01467      m115   23478      m157   03457      m199   34569      m241   02479
m32   13579      m74   04678      m116   34568      m158   02357      m200   12369      m242   36789
m33   34689      m75   12589      m117   23569      m159   01235      m201   13489      m243   13568
m34   12679      m76   01348      m118   14689      m160   01378      m202   12567      m244   02467
m35   12568      m77   14569      m119   23789      m161   14567      m203   02489      m245   01589
m36   34579      m78   01789      m120   01246      m162   23458      m204   02678      m246   01478
m37   01389      m79   01367      m121   23579      m163   56789      m205   13567      m247   15789
m38   23469      m80   12478      m122   01456      m164   34567      m206   01357      m248   01349
m39   24579      m81   25789      m123   23456      m165   01249      m207   01278      m249   02356
m40   02589      m82   01489      m124   03789      m166   03489      m208   02578      m250   14578
m41   01567      m83   03567      m125   05678      m167   02389      m209   12348      m251   13789
m42   13478      m84   12349      m126   13678      m168   12378      m210   01279      m252   02378
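
The 252 entries of Table 1 match the number of five-element subsets of a ten-point space. The short sketch below, which assumes that M_{0.5,A} denotes exactly this collection of all subsets of A with half of its points, enumerates them and counts how often each point is covered; the variable names are introduced here for illustration.

```python
from itertools import combinations

# All models that contain exactly 5 of the 10 points q_0..q_9.
all_models = list(combinations(range(10), 5))
print(len(all_models))

# Number of models in the full collection that cover each point.
coverage = [sum(q in m for m in all_models) for q in range(10)]
print(coverage)
```

Every point is covered by the same number of models in the full collection, which is the sense in which the complete set is uniform; a partial ensemble M_t only approaches this as t grows.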



Table 2. Ratio of coverage of each point q by members of M_t as M_t expands.
The left block gives the number of models in M_t that cover each point; the
right block gives that number divided by t.

M_t   q0 q1 q2 q3 q4 q5 q6 q7 q8 q9    q0   q1   q2   q3   q4   q5   q6   q7   q8   q9
M1     0  0  0  1  0  1  1  0  1  1   0.00 0.00 0.00 1.00 0.00 1.00 1.00 0.00 1.00 1.00
M2     1  1  1  1  0  1  2  0  2  1   0.50 0.50 0.50 0.50 0.00 0.50 1.00 0.00 1.00 0.50
M3     2  1  1  1  1  1  2  1  3  2   0.67 0.33 0.33 0.33 0.33 0.33 0.67 0.33 1.00 0.67
M4     2  1  2  1  1  2  3  1  4  3   0.50 0.25 0.50 0.25 0.25 0.50 0.75 0.25 1.00 0.75
M5     3  1  3  1  1  2  4  2  4  4   0.60 0.20 0.60 0.20 0.20 0.40 0.80 0.40 0.80 0.80
M6     3  1  3  2  2  3  4  3  5  4   0.50 0.17 0.50 0.33 0.33 0.50 0.67 0.50 0.83 0.67
M7     3  2  3  3  3  4  4  3  5  5   0.43 0.29 0.43 0.43 0.43 0.57 0.57 0.43 0.71 0.71
M8     4  3  4  4  3  4  4  3  6  5   0.50 0.38 0.50 0.50 0.38 0.50 0.50 0.38 0.75 0.62
M9     4  4  5  5  4  4  4  4  6  5   0.44 0.44 0.56 0.56 0.44 0.44 0.44 0.44 0.67 0.56
M10    5  5  5  5  4  5  4  5  6  6   0.50 0.50 0.50 0.50 0.40 0.50 0.40 0.50 0.60 0.60
M11    5  5  5  6  5  6  4  5  7  7   0.45 0.45 0.45 0.55 0.45 0.55 0.36 0.45 0.64 0.64
M12    6  5  5  7  6  7  4  5  7  8   0.50 0.42 0.42 0.58 0.50 0.58 0.33 0.42 0.58 0.67
M13    6  5  6  8  7  8  4  5  7  9   0.46 0.38 0.46 0.62 0.54 0.62 0.31 0.38 0.54 0.69
M14    7  5  7  8  8  9  4  6  7  9   0.50 0.36 0.50 0.57 0.57 0.64 0.29 0.43 0.50 0.64
M15    8  5  8  9  8  9  5  6  8  9   0.53 0.33 0.53 0.60 0.53 0.60 0.33 0.40 0.53 0.60
M16    9  5  9  9  8  9  6  6  9 10   0.56 0.31 0.56 0.56 0.50 0.56 0.38 0.38 0.56 0.62
M17   10  6  9 10  8  9  7  6 10 10   0.59 0.35 0.53 0.59 0.47 0.53 0.41 0.35 0.59 0.59
M18   10  7  9 11  8 10  7  6 11 11   0.56 0.39 0.50 0.61 0.44 0.56 0.39 0.33 0.61 0.61
M19   10  8  9 11  9 11  7  7 11 12   0.53 0.42 0.47 0.58 0.47 0.58 0.37 0.37 0.58 0.63
M20   10  8 10 12 10 11  8  7 12 12   0.50 0.40 0.50 0.60 0.50 0.55 0.40 0.35 0.60 0.60
M21   10  8 11 12 10 11  9  8 13 13   0.48 0.38 0.52 0.57 0.48 0.52 0.43 0.38 0.62 0.62
M22   10  9 11 12 10 12 10  9 14 13   0.45 0.41 0.50 0.55 0.45 0.55 0.45 0.41 0.64 0.59
M23   11  9 11 12 11 13 10 10 15 13   0.48 0.39 0.48 0.52 0.48 0.57 0.43 0.43 0.65 0.57
M24   12  9 11 12 12 13 11 11 15 14   0.50 0.38 0.46 0.50 0.50 0.54 0.46 0.46 0.62 0.58
M25   13  9 12 12 13 14 11 11 15 15   0.52 0.36 0.48 0.48 0.52 0.56 0.44 0.44 0.60 0.60
M26   13 10 13 12 13 15 12 11 15 16   0.50 0.38 0.50 0.46 0.50 0.58 0.46 0.42 0.58 0.62
M27   14 11 14 12 13 15 13 11 15 17   0.52 0.41 0.52 0.44 0.48 0.56 0.48 0.41 0.56 0.63
M28   15 11 14 12 13 15 14 12 16 18   0.54 0.39 0.50 0.43 0.46 0.54 0.50 0.43 0.57 0.64
M29   16 12 14 12 13 15 15 12 17 19   0.55 0.41 0.48 0.41 0.45 0.52 0.52 0.41 0.59 0.66
M30   17 13 15 12 14 15 15 12 18 19   0.57 0.43 0.50 0.40 0.47 0.50 0.50 0.40 0.60 0.63
M31   17 14 16 12 15 16 16 12 18 19   0.55 0.45 0.52 0.39 0.48 0.52 0.52 0.39 0.58 0.61
M32   17 15 16 13 15 17 16 13 18 20   0.53 0.47 0.50 0.41 0.47 0.53 0.50 0.41 0.56 0.62
M33   17 15 16 14 16 17 17 13 19 21   0.52 0.45 0.48 0.42 0.48 0.52 0.52 0.39 0.58 0.64
M34   17 16 17 14 16 17 18 14 19 22   0.50 0.47 0.50 0.41 0.47 0.50 0.53 0.41 0.56 0.65
M35   17 17 18 14 16 18 19 14 20 22   0.49 0.49 0.51 0.40 0.46 0.51 0.54 0.40 0.57 0.63
M36   17 17 18 15 17 19 19 15 20 23   0.47 0.47 0.50 0.42 0.47 0.53 0.53 0.42 0.56 0.64
M37   18 18 18 16 17 19 19 15 21 24   0.49 0.49 0.49 0.43 0.46 0.51 0.51 0.41 0.57 0.65
M38   18 18 19 17 18 19 20 15 21 25   0.47 0.47 0.50 0.45 0.47 0.50 0.53 0.39 0.55 0.66
M39   18 18 20 17 19 20 20 16 21 26   0.46 0.46 0.51 0.44 0.49 0.51 0.51 0.41 0.54 0.67
M40   19 18 21 17 19 21 20 16 22 27   0.47 0.45 0.53 0.42 0.47 0.53 0.50 0.40 0.55 0.68
M41   20 19 21 17 19 22 21 17 22 27   0.49 0.46 0.51 0.41 0.46 0.54 0.51 0.41 0.54 0.66
M42   20 20 21 18 20 22 21 18 23 27   0.48 0.48 0.50 0.43 0.48 0.52 0.50 0.43 0.55 0.64
M43   20 21 22 18 20 22 22 18 24 28   0.47 0.49 0.51 0.42 0.47 0.51 0.51 0.42 0.56 0.65
M44   21 21 22 18 21 23 23 18 24 29   0.48 0.48 0.50 0.41 0.48 0.52 0.52 0.41 0.55 0.66
M45   22 22 23 18 22 24 23 18 24 29   0.49 0.49 0.51 0.40 0.49 0.53 0.51 0.40 0.53 0.64
M46   23 23 23 18 23 25 23 18 25 29   0.50 0.50 0.50 0.39 0.50 0.54 0.50 0.39 0.54 0.63
M47   23 24 23 18 23 26 24 19 25 30   0.49 0.51 0.49 0.38 0.49 0.55 0.51 0.40 0.53 0.64
M48   23 25 24 18 24 27 24 20 25 30   0.48 0.52 0.50 0.38 0.50 0.56 0.50 0.42 0.52 0.62
M49   24 25 25 19 24 27 24 21 25 31   0.49 0.51 0.51 0.39 0.49 0.55 0.49 0.43 0.51 0.63
M50   25 25 26 19 24 28 25 21 26 31   0.50 0.50 0.52 0.38 0.48 0.56 0.50 0.42 0.52 0.62
M51   25 26 27 20 24 29 25 22 26 31   0.49 0.51 0.53 0.39 0.47 0.57 0.49 0.43 0.51 0.61
M52   25 27 27 20 25 29 26 23 27 31   0.48 0.52 0.52 0.38 0.48 0.56 0.50 0.44 0.52 0.60
M53   25 28 28 20 25 29 27 24 28 31   0.47 0.53 0.53 0.38 0.47 0.55 0.51 0.45 0.53 0.58
M54   25 28 29 21 25 30 28 25 28 31   0.46 0.52 0.54 0.39 0.46 0.56 0.52 0.46 0.52 0.57
M55   26 28 30 21 25 30 28 26 29 32   0.47 0.51 0.55 0.38 0.45 0.55 0.51 0.47 0.53 0.58
M56   26 28 31 21 26 31 29 27 29 32   0.46 0.50 0.55 0.38 0.46 0.55 0.52 0.48 0.52 0.57
M57   26 29 31 22 26 32 30 27 29 33   0.46 0.51 0.54 0.39 0.46 0.56 0.53 0.47 0.51 0.58
M58   27 30 32 22 26 33 30 27 29 34   0.47 0.52 0.55 0.38 0.45 0.57 0.52 0.47 0.50 0.59
M59   27 30 33 23 27 33 30 28 29 35   0.46 0.51 0.56 0.39 0.46 0.56 0.51 0.47 0.49 0.59
M60   28 30 33 24 27 34 30 29 29 36   0.47 0.50 0.55 0.40 0.45 0.57 0.50 0.48 0.48 0.60
M61   28 31 34 25 27 34 31 29 30 36   0.46 0.51 0.56 0.41 0.44 0.56 0.51 0.48 0.49 0.59
M62   28 31 35 26 27 35 31 30 31 36   0.45 0.50 0.56 0.42 0.44 0.56 0.50 0.48 0.50 0.58
M63   29 31 36 27 28 36 31 30 31 36   0.46 0.49 0.57 0.43 0.44 0.57 0.49 0.48 0.49 0.57
M64   30 32 36 27 29 36 31 31 31 37   0.47 0.50 0.56 0.42 0.45 0.56 0.48 0.48 0.48 0.58
M65   31 32 36 28 29 37 32 31 31 38   0.48 0.49 0.55 0.43 0.45 0.57 0.49 0.48 0.48 0.58
M66   32 33 36 29 30 37 33 31 31 38   0.48 0.50 0.55 0.44 0.45 0.56 0.50 0.47 0.47 0.58
M67   32 33 37 29 31 38 34 31 32 38   0.48 0.49 0.55 0.43 0.46 0.57 0.51 0.46 0.48 0.57
M68   33 34 37 30 31 39 34 31 32 39   0.49 0.50 0.54 0.44 0.46 0.57 0.50 0.46 0.47 0.57
M69   33 35 38 30 32 40 34 31 32 40   0.48 0.51 0.55 0.43 0.46 0.58 0.49 0.45 0.46 0.58
M70   34 36 39 31 32 40 34 31 32 41   0.49 0.51 0.56 0.44 0.46 0.57 0.49 0.44 0.46 0.59
M71   34 36 40 31 33 40 35 32 33 41   0.48 0.51 0.56 0.44 0.46 0.56 0.49 0.45 0.46 0.58
M72   35 37 40 32 34 40 35 33 33 41   0.49 0.51 0.56 0.44 0.47 0.56 0.49 0.46 0.46 0.57
M73   36 38 40 32 35 40 36 34 33 41   0.49 0.52 0.55 0.44 0.48 0.55 0.49 0.47 0.45 0.56
M74   37 38 40 32 36 40 37 35 34 41   0.50 0.51 0.54 0.43 0.49 0.54 0.50 0.47 0.46 0.55
M75   37 39 41 32 36 41 37 35 35 42   0.49 0.52 0.55 0.43 0.48 0.55 0.49 0.47 0.47 0.56
M76   38 40 41 33 37 41 37 35 36 42   0.50 0.53 0.54 0.43 0.49 0.54 0.49 0.46 0.47 0.55
M77   38 41 41 33 38 42 38 35 36 43   0.49 0.53 0.53 0.43 0.49 0.55 0.49 0.45 0.47 0.56
M78   39 42 41 33 38 42 38 36 37 44   0.50 0.54 0.53 0.42 0.49 0.54 0.49 0.46 0.47 0.56
M79   40 43 41 34 38 42 39 37 37 44   0.51 0.54 0.52 0.43 0.48 0.53 0.49 0.47 0.47 0.56
M80   40 44 42 34 39 42 39 38 38 44   0.50 0.55 0.53 0.42 0.49 0.53 0.49 0.47 0.47 0.55
M81   40 44 43 34 39 43 39 39 39 45   0.49 0.54 0.53 0.42 0.48 0.53 0.48 0.48 0.48 0.56
M82   41 45 43 34 40 43 39 39 40 46   0.50 0.55 0.52 0.41 0.49 0.52 0.48 0.48 0.49 0.56
M83   42 45 43 35 40 44 40 40 40 46   0.51 0.54 0.52 0.42 0.48 0.53 0.48 0.48 0.48 0.55
M84   42 46 44 36 41 44 40 40 40 47   0.50 0.55 0.52 0.43 0.49 0.52 0.48 0.48 0.48 0.56
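
The coverage counts and ratios of Table 2 follow directly from the model list of Table 1. A minimal sketch, assuming models are given as digit strings in the order of Table 1 (the function name coverage_rows is introduced here for illustration), recomputes the first four rows:

```python
def coverage_rows(models, n_points=10):
    """For each t, return (counts, ratios) of point coverage by m_1..m_t."""
    counts = [0] * n_points
    rows = []
    for t, model in enumerate(models, start=1):
        for ch in model:
            counts[int(ch)] += 1  # each digit is the index of a covered point
        rows.append((list(counts), [c / t for c in counts]))
    return rows


# m_1 .. m_4 as listed in Table 1.
first_models = ["35689", "01268", "04789", "25689"]

for t, (counts, ratios) in enumerate(coverage_rows(first_models), start=1):
    print("M%-3d %s   %s" % (t,
                             " ".join("%2d" % c for c in counts),
                             " ".join("%.2f" % r for r in ratios)))
```

Feeding the full list of models through the same function gives the remaining rows, with every ratio drifting toward 0.5 as the ensemble grows.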
Table 3. Changes of Y_12(q, M_t) as M_t expands. For each t, we show the ratings for
each new member of M_t, the values X_12 for this new member, and Y_12 for the collection
M_t up to the inclusion of this new member.